Abstract. Symbolic trajectory evaluation (STE) is a model checking 
technique that has been successfully used to verify industrial designs. 
Existing implementations of STE, however, reason at the level of bits, 
allowing signals to take values in {0, 1, X}. This limits the amount of 
abstraction that can be achieved, and presents inherent limitations to 
scaling. The main contribution of this paper is to show how much more 
abstract lattices can be derived automatically from RTL descriptions, 
and how a model checker for the general theory of STE instantiated 
with such abstract lattices can be implemented in practice. This gives us 
the first practical word-level STE engine, called STEWord. Experiments 
on a set of designs similar to those used in industry show that STEWord 
scales better than word-level BMC and also bit-level STE. 


1 Introduction 

Symbolic Trajectory Evaluation (STE) is a model checking technique that grew 
out of multi-valued logic simulation on the one hand, and symbolic simulation 
on the other hand [2]. Among various formal verification techniques in use today, 
STE comes closest to functional simulation and is among the most successful 
formal verification techniques used in the industry. In STE, specifications take 
the form of symbolic trajectory formulas that mix Boolean expressions and the 
temporal next-time operator. The Boolean expressions provide a convenient means 
of describing different operating conditions in a circuit in a compact form. By 
allowing only the most elementary of temporal operators, the class of properties 
that can be expressed is fairly restricted as compared to other temporal logics 
(see [3] for a nice survey). Nonetheless, experience has shown that many 
important aspects of synchronous digital systems at various levels of abstraction 
can be captured using this restricted logic. For example, it is quite adequate for 
expressing many of the subtleties of system operation, including clocking schemas, 
pipelining control, as well as complex data computations. 

In return for the restricted expressiveness of STE specifications, the STE 
model checking algorithm provides significant computational efficiency. As a 
result, STE can be applied to much larger designs than any other model checking 
technique. For example, STE is routinely used in the industry today to carry 
out complete formal input-output verification of designs with several hundred 
thousand latches. Unfortunately, this still falls short of providing an 
automated technique for formally verifying modern system-on-chip designs, and 
there is clearly a need to scale up the capacity of STE even further. 

The first approach that was pursued in this direction was structural 
decomposition. In this approach, the user must break down a verification task into 
smaller sub-tasks, each involving a distinct STE run. After this, a deductive 
system can be used to reason about the collections of STE runs and verify 
that they together imply the desired property of the overall design [5]. In 
theory, structural decomposition allows verification of arbitrarily complex designs. 
However, in practice, the difficulty and tedium of breaking down a property into 
small enough sub-properties that can be verified with an STE engine limits the 
usefulness of this approach significantly. In addition, managing the structural 
decomposition in the face of rapidly changing RTL limits the applicability of 
structural decomposition even further. 

A different approach to increase the scale of designs that can be verified is 
to use aggressive abstraction beyond what is provided automatically by 
current STE implementations. If we ensure that our abstract model satisfies the 
requirements of the general theory of STE, then a property that is verified on 
the abstract model holds on the original model as well. Although the general 
theory of STE allows a very general circuit model, all STE implementations 
so far have used a three-valued circuit model. Thus, every bit-level signal is 
allowed to have one of three values: 0, 1 or X, where X represents “either 0 or 1”. 
This limits the amount of abstraction that can be achieved. The main 
contribution of this paper is to show how much more abstract lattices can be derived 
automatically from RTL descriptions, and how the general theory of STE can 
be instantiated with this lattice to give a practical word-level STE engine that 
provides significant gains in capacity and efficiency on a set of benchmarks. 

Operationally, word-level STE bears similarities with word-level bounded 
model checking (BMC). However, there are important differences, the most 
significant one being the use of X-based abstractions on slices of words, called 
atoms, in word-level STE. This allows a wide range of abstraction possibilities, 
including a combination of user-specified and automatic abstractions - often a 
necessity for complex verification tasks. Our preliminary experimental results 
indicate that by carefully using X-based abstractions in word-level STE, it is 
indeed possible to strike a good balance between accuracy (cautious propagation 
of X) and performance (liberal propagation of X). 

The remainder of the paper is organized as follows. We discuss how words 
in an RTL design can be split into atoms in Section 2. Atoms form the basis 
of abstracting groups of bits. In Section 3, we elaborate on the lattice of values 
that this abstraction generates, and Section 4 presents a new way of encoding 
values of atoms in this lattice. We also discuss how to symbolically simulate 
RTL operators and compute least upper bounds using this encoding. Subsequent 
sections present an instantiation of the general theory of STE using the above 
lattice, discuss an implementation, present experimental results on a set of 
RTL benchmarks, and conclude. 


2 Atomizing words 

In bit-level STE, every variable is allowed to take values from {0, 1, X}, 
where X denotes “either 0 or 1”. The ordering of information in the values 0, 
1 and X is shown in the lattice in Fig. 1, where a value lower in the order has 
“less information” than one higher up in the order. The element ⊤ is added 
to complete the lattice, and represents an unachievable over-constrained value. 
Tools that implement bit-level STE usually use dual-rail encoding to reason 
about ternary values of variables. In dual-rail encoding, every bit-level variable v 
is encoded using two binary variables $v_0$ and $v_1$. Intuitively, $v_i$ indicates whether v 
can take the value i, for i in {0, 1}. Thus, 0, 1 and X are encoded by the valuations 
(1,0), (0,1) and (1,1), respectively, of $(v_0, v_1)$. By convention, $(v_0, v_1) = (0,0)$ 
denotes ⊤. An undesired consequence of dual-rail encoding is the doubling of 
binary variables in the encoded system. This can pose serious scalability issues 
when verifying designs with wide datapaths, large memories, etc. Attempts to 
scale STE to large designs must therefore raise the level of abstraction beyond 
that of individual bits. 

In principle, one could go to the other extreme, and run STE at the level of 
words as defined in the RTL design. This requires defining a lattice of values of 
words, and instantiating the general theory of STE with this lattice. The difficulty 
with this approach lies in implementing it in practice. The lattice of values of an 
m-bit word, where each bit in the word can take values in {0, 1, X}, is of size at 
least $3^m$. Symbolically representing values from such a large lattice and reasoning 
about them is likely to incur overheads similar to that incurred in bit-level STE. 
Therefore, STE at the level of words (as defined in the RTL design) does not 
appear to be a practical proposition for scaling. 

The idea of splitting words into sub-words for the purpose of simplifying 
analysis is not new. An aggressive approach to splitting (an extreme 
example being bit-blasting) can lead to proliferation of narrow sub-words, 
making our technique vulnerable to the same scalability problems that arise with 
dual-rail encoding. Therefore, we adopt a more controlled approach to splitting. 
Specifically, we wish to split words in such a way that we can speak of an entire 
sub-word having the value X without having to worry about which individual 
bits in the sub-word have the value X. Towards this end, we partition every 
word in an RTL design into sub-words, which we henceforth call atoms, such 
that every RTL statement (except a few discussed later) that reads or updates 
a word either does so for all bits in an atom, or for no bit in an atom. In other 
words, no RTL statement (except the few discussed at the end of this section) 
reads or updates an atom partially. 



Fig. 1. Ternary lattice 




Some details of atomization. To formalize the notion of atoms, let w be a 
word of width m in an RTL design C. Let 0 denote the least significant bit 
position and m - 1 denote the most significant bit position of w. For integer 
constants p, q such that $0 \le p \le q \le m-1$, we say that the sub-word of w from 
bit position p to q is a slice of w, and denote it by w[q : p]. Let AbsSel(w, q, p) be an 
abstract selection operator that either reads or writes the slice w[q : p]. Concrete 
instances of AbsSel are commonly used in RTL designs, e.g. in the System-Verilog 
statement c[4:1] = a[10:7] + b[5:2]. We say that AbsSel(w, q, p) induces an 
atomization of w, as shown in Table 1, where $Atoms_w$ denotes the set of atoms 
into which w is partitioned. 


Condition                  Atoms_w

q < m-1 and p > 0          {w[m-1 : q+1], w[q : p], w[p-1 : 0]}
q < m-1 and p = 0          {w[m-1 : q+1], w[q : 0]}
q = m-1 and p > 0          {w[m-1 : p], w[p-1 : 0]}
q = m-1 and p = 0          {w[m-1 : 0]}

Table 1. Computing atoms induced by AbsSel(w, q, p) 


Given atomizations $Atoms^1_w$ and $Atoms^2_w$, we define their coarsest 
refinement to be the atomization in which $w[m_1 : m_1]$ and $w[m_2 : m_2]$ belong to the 
same atom iff they belong to the same atom in both $Atoms^1_w$ and $Atoms^2_w$. 
For every word w[m-1 : 0] in the RTL design, we maintain a working set, 
$WSetAtoms_w$, of atoms. Initially, $WSetAtoms_w$ is set to {w[m-1 : 0]}. 
For every concrete instance of AbsSel applied on w in an RTL statement, we 
compute $Atoms_w$ using Table 1, and determine the coarsest refinement of $Atoms_w$ 
and $WSetAtoms_w$. The working set $WSetAtoms_w$ is then updated to the coarsest 
refinement thus computed. The above process is then repeated for every RTL 
statement in the design. 
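
The working-set procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function names `atoms_from_select`, `coarsest_refinement` and `atomize`, and the representation of an atom as a `(high, low)` pair of bit positions, are assumptions made for this example.

```python
def atoms_from_select(m, q, p):
    """Atoms induced on an m-bit word by one access to the slice w[q:p] (Table 1)."""
    assert 0 <= p <= q <= m - 1
    atoms = []
    if q < m - 1:
        atoms.append((m - 1, q + 1))   # bits above the slice
    atoms.append((q, p))               # the slice itself
    if p > 0:
        atoms.append((p - 1, 0))       # bits below the slice
    return atoms


def coarsest_refinement(atoms_a, atoms_b):
    """Two bits share an atom iff they share an atom in both atomizations.

    For contiguous atoms this is the partition cut at the union of boundaries.
    """
    lows = sorted({lo for _, lo in atoms_a} | {lo for _, lo in atoms_b})
    top = max(hi for hi, _ in atoms_a)
    highs = [lo - 1 for lo in lows[1:]] + [top]
    return [(hi, lo) for hi, lo in zip(reversed(highs), reversed(lows))]


def atomize(m, slices):
    """Fold every slice access (q, p) on an m-bit word into the working set."""
    working = [(m - 1, 0)]             # initially the whole word is one atom
    for q, p in slices:
        working = coarsest_refinement(working, atoms_from_select(m, q, p))
    return working
```

For instance, an 8-bit word accessed as `[5:2]` and `[3:0]` is split by `atomize(8, [(5, 2), (3, 0)])` into the four 2-bit atoms `[7:6]`, `[5:4]`, `[3:2]` and `[1:0]`.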

The above discussion leads to a fairly straightforward algorithm for 
identifying atoms in an RTL design. We illustrate this on a simple example below. 
Fig. 2(a) shows a System-Verilog code fragment, and Fig. 2(b) shows an 
atomization of words, where the solid vertical bars represent the boundaries of atoms. 
Note that every System-Verilog statement in Fig. 2(a) either reads or writes all 
bits in an atom, or no bit in an atom. Since we wish to reason at the 
granularity of atoms, we must interpret word-level reads and writes in terms of the 
corresponding atom-level reads and writes. This can be done either by modifying 
the RTL, or by taking appropriate care when symbolically simulating the RTL. 
For simplicity of presentation, we show in Fig. 2(c) how the code fragment in 
Fig. 2(a) would appear if we were to use only the atoms identified in Fig. 2(b). 
Note that no statement in the modified RTL updates or reads a slice of an atom. 
However, a statement may be required to read a slice of the result obtained by 
applying an RTL operator to atoms (see, for example, Fig. 2(c), where we read a 
slice of the result obtained by adding concatenated atoms). In our 
implementation, we do not modify the RTL. Instead, we symbolically simulate the original 
RTL, but generate the expressions for various atoms that would result from 
simulating the modified RTL. 

    reg [3:0] x; 
    reg [7:0] y; 
    reg [7:0] z; 
    reg [3:0] w; 

    z[4:1] = x + y[5:2]; 
    w = z[3:0] + y[3:0]; 

    (a) 

    reg [3:0] x; 
    reg [1:0] y_1_0; reg [1:0] y_3_2; 
    reg [1:0] y_5_4; reg [1:0] y_7_6; 
    reg z_0_0; reg [2:0] z_3_1; 
    reg z_4_4; reg [2:0] z_7_5; 
    reg [3:0] w; 

    z_4_4 = (x + {y_5_4, y_3_2})[3:3]; 
    z_3_1 = (x + {y_5_4, y_3_2})[2:0]; 
    w = ({z_3_1, z_0_0} + {y_3_2, y_1_0}); 

    (c) 

Once the boundaries of all atoms are determined, we choose to disregard 
values of atoms in which some bits are set to X, and the others are set to 0 or 1. 
This choice is justified since all bits in an atom are read or written together. 
Thus, either all bits in an atom are considered to have values in {0, 1}, or all of 
them are considered to have the value X. This implies that values of an m-bit 
atom can be encoded using m + 1 bits, instead of using 2m bits as in dual-rail 
encoding. Specifically, we can associate an additional “invalid” bit with every 
m-bit atom. Whenever the “invalid” bit is set, all bits in the atom are assumed to 
have the value X. Otherwise, all bits are assumed to have values in {0, 1}. We show 
later in Sections 4.1 and 4.2 how the value and invalid bit of an atom can be 
recursively computed from the values and invalid bits of the atoms on which it 
depends. 

Memories and arrays in an RTL design are usually indexed by variables 
instead of by constants. This makes it difficult to atomize memories and arrays 
statically, and we do not atomize them. Similarly, if a design has a logical shift 
operation, where the amount of shift is specified by a variable, it is difficult 
to statically identify subwords that are not split by the shift operation. We 
ignore all such RTL operations during atomization, and instead use extensional 
arrays to model and reason about them. Section 4.2 discusses the modeling 
of memory/array reads and writes in this manner. 




Fig. 2. Illustrating atomization: (a) System-Verilog fragment, (b) atomization of words, (c) fragment rewritten over atoms 


3 Lattice of atom values 


Recall that the primary motivation for atomizing words is to identify the right 
granularity at which an entire sub-word (atom) can be assigned the value X 
without worrying about which bits in the sub-word have the value X. Therefore, 
an m-bit atom a takes values from the set 
$\{\underbrace{0 \cdots 00}_{m\ \text{bits}}, \ldots, \underbrace{1 \cdots 11}_{m\ \text{bits}}, X\}$, where X is 
a single abstract value that denotes an assignment of X to at least one bit of a. 
Note the conspicuous absence of values like 0X1···0 in the above set. Fig. 3(a) 
shows the lattice of values for a 3-bit atom, ordered by information content. 
The ⊤ element is added to complete the lattice, and represents an unachievable 
over-constrained value. Fig. 3(b) shows the lattice of values of the same atom if 
we allow each bit to take values in {0, 1, X}. Clearly, the lattice in Fig. 3(a) is 
shallower and sparser than that in Fig. 3(b). 

Fig. 3. Atom-level and bit-level lattices 

Consider an m-bit word w that has been partitioned into non-overlapping 
atoms of widths $m_1, \ldots, m_r$, where $\sum_{j=1}^{r} m_j = m$. The lattice of values of w 
is given by the product of r lattices, each corresponding to the values of an 
atom of w. For convenience of representation, we simplify the product lattice by 
collapsing all values that have at least one atom set to ⊤ (and therefore represent 
unachievable over-constrained values), to a single ⊤ element. It can be verified 
that the height of the product lattice (after the above simplification) is given by 
$r + 1$, the total number of elements in it is given by $\prod_{j=1}^{r} (2^{m_j} + 1) + 1$, and the 
number of elements at level i from the bottom is given by $\binom{r}{i} \prod_{j=1}^{i} 2^{m_j}$, where 
$0 \le i \le r$. It is not hard to see from these expressions that atomization using 
few wide atoms (i.e., small values of r and large values of $m_j$) gives shallow and 
sparse lattices compared to atomization using many narrow atoms (i.e., large 
values of r and small values of $m_j$). The special case of a bit-blasted lattice (see 
Fig. 3(b)) is obtained when $r = m$ and $m_j = 1$ for every $j \in \{1, \ldots, m\}$. 
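
To get a feel for how atom widths affect the lattice, the height and total-size expressions above can be evaluated directly. The following is a small sketch; the function names `lattice_height` and `lattice_size` are chosen for this example only.

```python
from math import prod


def lattice_height(widths):
    """Height r + 1 of the simplified product lattice for atoms of the given widths."""
    return len(widths) + 1


def lattice_size(widths):
    """Total number of elements: prod_j (2^{m_j} + 1), plus one for the top element."""
    return prod(2 ** m + 1 for m in widths) + 1
```

A 3-bit word kept as one atom gives `lattice_size([3]) == 10` and height 2, whereas the bit-blasted version gives `lattice_size([1, 1, 1]) == 28` and height 4, which matches the shallower, sparser picture described for the atom-level lattice.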

Using a sparse lattice is advantageous in symbolic reasoning since we need 
to encode a small set of values. Using a shallow lattice helps in converging fast 
when computing least upper bounds - an operation that is crucially needed 
when performing symbolic trajectory evaluation. However, making the lattice 
of values sparse and shallow comes at the cost of losing precision of reasoning. 
By atomizing words based on their actual usage in an RTL design, and by 
abstracting values of atoms wherein some bits are set to X and the others are 
set to 0 or 1, we strike a balance between depth and density of the lattice of 
values on one hand, and precision of reasoning on the other. 


4 Symbolic simulation with invalid-bit encoding 


As mentioned earlier, an m-bit atom can be encoded with m + 1 bits by 
associating an “invalid bit” with the atom. For notational convenience, we use 
val(a) to denote the value of the m bits constituting atom a, and inv(a) to 
denote the value of its invalid bit. Thus, an m-bit atom a is encoded as a pair 
(val(a), inv(a)), where val(a) is a bit-vector of width m, and inv(a) is of Boolean 
type. Given (val(a), inv(a)), the value of a is given by ite(inv(a), X, val(a)), where 
“ite” denotes the usual “if-then-else” operator. For clarity of exposition, we call 
this encoding “invalid-bit encoding”. Note that invalid-bit encoding differs from 
dual-rail encoding even when m = 1. Specifically, if a 1-bit atom a has the value 
X, we can use either (0, true) or (1, true) for (val(a), inv(a)) in invalid-bit 
encoding. In contrast, there is a single value, namely $(a_0, a_1) = (1, 1)$, that encodes the 
value X of a in dual-rail encoding. We will see later in this section how this degree of 
freedom in invalid-bit encoding of X can be exploited to simplify the symbolic 
simulation of word-level operations on invalid-bit-encoded operands, and also to 
simplify the computation of least upper bounds. 
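
As a concrete illustration of the encoding just described, the sketch below models an atom as a (val, inv) pair and decodes it with the ite rule. The names `Atom`, `decode`, `dual_rail_bits` and `invalid_bit_bits` are hypothetical, introduced only for this example.

```python
from typing import NamedTuple, Union


class Atom(NamedTuple):
    val: int    # value of the m bits; irrelevant when inv is True
    inv: bool   # invalid bit: True means the whole atom has the value X


def decode(atom: Atom) -> Union[int, str]:
    """Value of the atom, i.e. ite(inv(a), X, val(a))."""
    return "X" if atom.inv else atom.val


def dual_rail_bits(m: int) -> int:
    """Binary variables needed for an m-bit atom under dual-rail encoding."""
    return 2 * m


def invalid_bit_bits(m: int) -> int:
    """Binary variables needed for an m-bit atom under invalid-bit encoding."""
    return m + 1
```

Both `Atom(0, True)` and `Atom(1, True)` decode to X, which is the degree of freedom mentioned above; for a 32-bit atom the encoding needs 33 bits rather than 64.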

Symbolic simulation is a key component of symbolic trajectory evaluation. In 
order to symbolically simulate an RTL design in which every atom is invalid-bit 
encoded, we must first determine the semantics of word-level RTL operators with 
respect to invalid-bit encoding. Towards this end, we describe below a generic 
technique for computing the value component of the invalid-bit encoding of the 
result of applying a word-level RTL operator. Subsequently, we discuss how the 
invalid-bit component of the encoding is computed. 


4.1 Symbolically simulating values 

Let op be a word-level RTL operator of arity k, and let res be the result of 
applying op on $v_1, v_2, \ldots, v_k$, i.e., $res = op(v_1, v_2, \ldots, v_k)$. For each i in $\{1, \ldots, k\}$, 
suppose the bit-width of operand $v_i$ is $m_i$, and suppose the bit-width of res is 
$m_{res}$. We assume that each operand is invalid-bit encoded, and we are interested 
in computing the invalid-bit encoding of a specified slice of the result, say 
res[q : p], where $0 \le p \le q \le m_{res} - 1$. Let $\langle op \rangle : \{0,1\}^{m_1} \times \cdots \times \{0,1\}^{m_k} \to \{0,1\}^{m_{res}}$ 
denote the RTL semantics of op. For example, if op denotes 32-bit unsigned 
addition, then $\langle op \rangle$ is the function that takes two 32-bit operands and returns 
their 32-bit unsigned sum. The following lemma states that val(res[q : p]) can 
be computed if we know $\langle op \rangle$ and $val(v_i)$, for every $i \in \{1, \ldots, k\}$. Significantly, 
we do not need $inv(v_i)$ for any $i \in \{1, \ldots, k\}$ to compute val(res[q : p]). 

Lemma 1. Let $v = (\langle op \rangle(val(v_1), val(v_2), \ldots, val(v_k)))[q : p]$. Then val(res[q : p]) 
is given by v, where $res = op(v_1, v_2, \ldots, v_k)$. 

Proof. By definition of invalid-bit encoding, if inv(res[q : p]) is true, the value of 
val(res[q : p]) does not matter. Hence, we focus on the case where inv(res[q : p]) is 
false. By definition, in this case, res[q : p] has a value in $\{0,1\}^{q-p+1}$. If the invalid 
bits of all operands $v_i$ are false, then $(\langle op \rangle(val(v_1), val(v_2), \ldots, val(v_k)))[q : p]$ 
clearly computes the value of val(res[q : p]). Otherwise, suppose $inv(v_i)$ = true 
for some $i \in \{1, \ldots, k\}$. By definition of invalid-bit encoding, $v_i$ can have any 
value in $\{0,1\}^{m_i}$. However, since inv(res[q : p]) is false, it must be the case that 
val(res[q : p]) has a well-defined value in $\{0,1\}^{q-p+1}$, regardless of what value $v_i$ 
takes in $\{0,1\}^{m_i}$. Therefore, we can set the value of $v_i$ to $val(v_i)$ without affecting 
the value of res[q : p]. By repeating this argument for all $v_i$ such that $inv(v_i)$ is 
true, we see that $(\langle op \rangle(val(v_1), val(v_2), \ldots, val(v_k)))[q : p]$ gives val(res[q : p]). 

Lemma 1 tells us that when computing val(res[q : p]), we can effectively 
assume that invalid-bit encoding is not used. This simplifies symbolic simulation 
with invalid-bit encoding significantly. Note that this simplification would not 
have been possible had we not had the freedom to ignore val(res[q : p]) when 
inv(res[q : p]) is true. 
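
In concrete terms, Lemma 1 says the value component is obtained by applying the ordinary operator semantics to the value components of the operands and then slicing. A minimal sketch follows, assuming the operator semantics is supplied as a Python function on integers; the name `val_of_result_slice` is hypothetical.

```python
def val_of_result_slice(op, operand_vals, m_res, q, p):
    """val(res[q:p]) per Lemma 1: apply <op> to the operands' val components only.

    The invalid bits of the operands are deliberately not consulted here.
    """
    full = op(*operand_vals) & ((1 << m_res) - 1)   # truncate to the result width
    return (full >> p) & ((1 << (q - p + 1)) - 1)   # extract bits q down to p
```

For example, with 8-bit addition, `val_of_result_slice(lambda x, y: x + y, [0x12, 0x34], 8, 7, 4)` yields the upper nibble of 0x46. Whether that value is meaningful is decided separately by the invalid bit.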

4.2 Symbolically simulating invalid bits 

We now turn to computing inv(res[q : p]). Unfortunately, computing 
inv(res[q : p]) precisely is difficult and involves operator-specific functions that are often 
complicated. We therefore choose to approximate inv(res[q : p]) in a sound 
manner with functions that are relatively easy to compute. Specifically, we allow 
inv(res[q : p]) to evaluate to true (denoting res[q : p] = X) even in cases where 
a careful calculation would have shown that $op(v_1, v_2, \ldots, v_k)$ is not X. 
However, we never set inv(res[q : p]) to false if any bit in res[q : p] can take the 
value X in a bit-blasted evaluation of res. Striking a fine balance between the 
precision and computational efficiency of the sound approximations is key to 
building a practically useful symbolic simulator using invalid-bit encoding. Our 
experience indicates that simple and sound approximations of inv(res[q : p]) can 
often be carefully chosen to serve our purpose. While we have derived templates 
for approximating inv(res[q : p]) for res obtained by applying all word-level 
RTL operators that appear in our benchmarks, we cannot present all of them 
in detail here due to space constraints. We present below a discussion of how 
inv(res[q : p]) is approximated for a subset of important RTL operators. 
Importantly, we use a recursive formulation for computing inv(res[q : p]). This allows 
us to recursively compute invalid bits of atoms obtained by applying complex 
sequences of word-level operations to a base set of atoms. 

Word-level addition. Let $+_m$ denote an m-bit addition operator. Thus, if a 
and b are m-bit operands, $a +_m b$ generates an m-bit sum and a 1-bit carry. 
Let the carry generated after adding the least significant r bits of the operands 
be denoted $carry_r$. We discuss below how to compute sound approximations of 
inv(sum[q : p]) and $inv(carry_r)$, where $0 \le p \le q \le m-1$ and $1 \le r \le m$. 

It is easy to see that the value of sum[q : p] is completely determined by 
a[q : p], b[q : p] and $carry_p$. Therefore, we can approximate inv(sum[q : p]) as 
follows: 

$inv(sum[q : p]) = inv(a[q : p]) \lor inv(b[q : p]) \lor inv(carry_p)$ 

To see why the above approximation is sound, note that if all of inv(a[q : p]), 
inv(b[q : p]) and $inv(carry_p)$ are false, then a[q : p], b[q : p] and $carry_p$ must 
have non-X values. Hence, there is no uncertainty in the value of sum[q : p] and 
inv(sum[q : p]) = false. On the other hand, if any of inv(a[q : p]), inv(b[q : p]) or 
$inv(carry_p)$ is true, there is uncertainty in the value of sum[q : p]. 

The computation of $inv(carry_p)$ (or $inv(carry_r)$) is interesting, and deserves 
special attention. We identify three cases below, and argue that $inv(carry_p)$ is 
false in each of these cases. In the following, 0 denotes the p-bit constant 00···0. 

1. If (inv(a[p-1 : 0]) ∨ inv(b[p-1 : 0])) = false, then both inv(a[p-1 : 0]) 
and inv(b[p-1 : 0]) must be false. Therefore, there is no uncertainty in the 
values of either a[p-1 : 0] or b[p-1 : 0], and $inv(carry_p)$ = false. 

2. If (¬inv(a[p-1 : 0]) ∧ (val(a[p-1 : 0]) = 0)), then the least significant p 
bits of val(a) are all 0. Regardless of val(b), it is easy to see that in this case, 
$val(carry_p) = 0$ and $inv(carry_p)$ = false. 

3. This is the symmetric counterpart of the case above, i.e., 
(¬inv(b[p-1 : 0]) ∧ (val(b[p-1 : 0]) = 0)). 

We now approximate $inv(carry_p)$ by combining the conditions corresponding to 
the three cases above. In other words, 

$inv(carry_p) = (inv(a[p-1 : 0]) \lor inv(b[p-1 : 0])) \land$ 
$\qquad (inv(a[p-1 : 0]) \lor (val(a[p-1 : 0]) \neq 0)) \land$ 
$\qquad (inv(b[p-1 : 0]) \lor (val(b[p-1 : 0]) \neq 0))$ 
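
The addition rules above translate directly into executable form. The sketch below is illustrative only and rests on several assumptions not taken from the paper: a word is modelled as a `Word(val, mask)` pair in which bit j of `mask` is set when the atom containing bit j is X, so that inv of a slice is "some mask bit in the slice is set"; and for p = 0 the carry-in is taken to be the known constant 0.

```python
from typing import NamedTuple


class Word(NamedTuple):
    val: int    # concrete bits (meaningful only where mask is 0)
    mask: int   # assumed representation: bit j set => the atom holding bit j is X


def val_slice(w: Word, q: int, p: int) -> int:
    return (w.val >> p) & ((1 << (q - p + 1)) - 1)


def inv_slice(w: Word, q: int, p: int) -> bool:
    return ((w.mask >> p) & ((1 << (q - p + 1)) - 1)) != 0


def inv_carry(a: Word, b: Word, p: int) -> bool:
    """Sound approximation of inv(carry_p), the carry into bit position p."""
    if p == 0:
        return False  # assumption: no bits below position 0, carry-in is 0
    ia, ib = inv_slice(a, p - 1, 0), inv_slice(b, p - 1, 0)
    return ((ia or ib)
            and (ia or val_slice(a, p - 1, 0) != 0)
            and (ib or val_slice(b, p - 1, 0) != 0))


def inv_sum(a: Word, b: Word, q: int, p: int) -> bool:
    """inv(sum[q:p]) = inv(a[q:p]) or inv(b[q:p]) or inv(carry_p)."""
    return inv_slice(a, q, p) or inv_slice(b, q, p) or inv_carry(a, b, p)
```

With `a = Word(0x10, 0x00)` and `b = Word(0x25, 0x0F)`, the low nibble of `b` is X but the low nibble of `a` is a known zero, so `inv_carry(a, b, 4)` is false (case 2) and the upper nibble of the sum stays valid.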


Word-level division. Let $\div_m$ denote an m-bit division operator; this is among 
the most complicated word-level RTL operators for which we have derived an 
approximation of the invalid bit. If a and b are m-bit operands, $a \div_m b$ generates 
an m-bit quotient, say quot, and an m-bit remainder, say rem. We wish to 
compute inv(quot[q : p]) and inv(rem[q : p]), where $0 \le p \le q \le m-1$. We assume 
that if inv(b) is false, then $b \neq 0$; the case of $a \div_m b$ with (val(b), inv(b)) = (0, false) 
leads to a “divide-by-zero” exception, and is assumed to be handled separately. 

The following expressions give sound approximations for inv(quot[q : p]) and 
inv(rem[q : p]). In these expressions, we assume that i is a non-negative integer 
such that $2^i \le val(b) < 2^{i+1}$. 

$inv(quot[q : p]) = ite(inv(b), temp_1, temp_2)$, where 
$temp_1 = inv(a) \lor (val(a[m-1 : p]) \neq 0)$ and 
$temp_2 = ite(val(b) = 2^i, temp_3, (i < p) \lor inv(a[m-1 : p]))$, where 
$temp_3 = (p + i \le m-1) \land inv(a[\min(q+i, m-1) : p+i])$ 

$inv(rem[q : p]) = inv(b) \lor ite(val(b) = 2^i, (i > p) \land inv(a[\min(q, i-1) : p]), i \ge p)$ 

Note that the constraint $2^i \le val(b) < 2^{i+1}$ in the above formulation refers 
to a fresh variable i that does not appear in the RTL. We will see later 
that a word-level STE problem is solved by generating a set of word-level 
constraints, every satisfying assignment of which gives a counterexample 
to the verification problem. We add constraints like $2^i \le val(b) < 2^{i+1}$ in the 
above formulation to the set of word-level constraints generated for an STE 
problem. This ensures that every assignment of i in a counterexample satisfies 
the required constraints on i. 

To see why the above approximations for inv(quot[q : p]) and inv(rem[q : p]) 
are sound, first consider the case where inv(b) = true. Since we are unsure of 
the value of the divisor, not much can be said about the remainder. So, we set 
inv(rem[q : p]) to true. The situation is slightly better for the quotient. If we 
know that inv(a) = false, then since the quotient of integer division is never 
larger than the dividend, we can infer that quot[q : p] = 0 if a[m-1 : p] = 0. 
Clearly, in this case inv(quot[q : p]) = false. In all other sub-cases of inv(b) = true, 
we set inv(quot[q : p]) to true. 

If inv(b) = false, we know that b has a value in $\{0,1\}^m$, but not 0. 
Representing bit vectors by their integer representations, let $i \in \{0, \ldots, m-1\}$ be such 
that $2^i \le val(b) < 2^{i+1}$. We consider two sub-cases below. 

- $val(b) = 2^i$: In this case, $a \div_m b$ effectively shifts a right by i bit positions, and 
the least significant i bits of a form the remainder. Therefore, val(quot[q : p]) 
is a[q+i : p+i] if $q + i \le m-1$, is a[m-1 : p+i] padded to the left with 
$q - m + i + 1$ 0s if $q + i > m-1 \ge p + i$, and is 0 if $p + i > m-1$. It follows 
that if $p + i > m-1$, then val(quot[q : p]) = 0 and inv(quot[q : p]) = false. 
Otherwise, inv(quot[q : p]) = inv(a[k : p+i]), where $k = \min(q+i, m-1)$. It 
is easy to see that val(rem[q : p]) is a[q : p] if $i > q$, is a[i-1 : p] padded with 
$q - i + 1$ 0s to the left if $q \ge i > p$, and is 0 if $i \le p$. By similar reasoning, if 
$i \le p$, then inv(rem[q : p]) = false; otherwise, inv(rem[q : p]) = inv(a[k : p]), 
where $k = \min(q, i-1)$. 

- $2^i < val(b) < 2^{i+1}$: In this case, we show below that if $i \ge p$, then 
inv(quot[q : p]) can be approximated by inv(a[m-1 : p]). If $i < p$, then inv(rem[q : p]) = 
false. In all other cases, we approximate inv(quot[q : p]) and inv(rem[q : p]) 
by true. 

To see why the above approximations are sound, note that val(a) can be 
written as $a_1 \cdot 2^p + a_2$, where $a_1$ and $a_2$ are the integer representations of 
a[m-1 : p] and a[p-1 : 0], respectively. Clearly, $0 \le a_2 < 2^p$. Considering 
quotients and remainders on division by val(b), suppose $a_1 = k_1 \cdot val(b) + r_1$ 
and $a_2 = k_2 \cdot val(b) + r_2$, where $0 \le r_1, r_2 < val(b)$ and $k_1, k_2 \ge 0$. Suppose 
further that $2^p \cdot r_1 + r_2 = k_3 \cdot val(b) + r_3$, where $0 \le r_3 < val(b)$ and $k_3 \ge 0$. 
It is an easy exercise to see that the quotient of dividing val(a) by val(b) is 
$2^p \cdot k_1 + k_2 + k_3$, and the remainder is $r_3$. Thus, $val(quot) = 2^p \cdot k_1 + k_2 + k_3$ 
and $val(rem) = r_3$. We discuss what happens when $i \ge p$ and $i + 1 \le p$. 

• If $i \ge p$, then $val(b) > 2^i \ge 2^p > a_2$. Since $val(b) > a_2$, we have $k_2 = 0$ 
and $r_2 = a_2 < 2^p$. It follows that $quot = 2^p \cdot k_1 + k_3$. If $k_3 < 2^p$, 
then quot[q : p] depends only on $k_1$, which in turn, depends only on 
a[m-1 : p] and val(b). Therefore, inv(quot[q : p]) can be approximated 
by inv(a[m-1 : p]). 

We now show that $k_3$ is indeed strictly less than $2^p$. Since $2^p \cdot r_1 + r_2 = 
k_3 \cdot val(b) + r_3$, rearranging terms, we get $k_3 \cdot val(b) - 2^p \cdot r_1 = r_2 - r_3$. 
If possible, let $k_3 = 2^p + d$, where $d \ge 0$. Substituting for $k_3$, we get 
$2^p \cdot (val(b) - r_1) + d \cdot val(b) = r_2 - r_3$. Since $val(b) > r_1$, the left hand side 
of the above equation is at least as large as $2^p$, while the right hand side 
is at most $r_2$, which, in turn, is less than $2^p$. This gives a contradiction, 
and therefore, $k_3 < 2^p$. 

• If $i < p$, we have $rem = r_3 < val(b) < 2^{i+1} \le 2^p$. Therefore, 
val(rem[q : p]) = 0, and inv(rem[q : p]) = false. 

The above analysis yields the sound approximations for inv(quot[q : p]) and 
inv(rem[q : p]) discussed above. 
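
The division rules can likewise be written out for a concrete divisor. This sketch makes assumptions beyond the text: words are modelled as `Word(val, mask)` pairs with `mask` marking X bits (a hypothetical representation), inv(a) and inv(b) are read as "some atom of the word is X", and i is computed directly as the position of the leading 1 of val(b) rather than introduced as a constrained fresh variable as in the formulation above.

```python
from typing import NamedTuple


class Word(NamedTuple):
    val: int    # concrete bits (meaningful only where mask is 0)
    mask: int   # assumed representation: bit j set => the atom holding bit j is X


def val_slice(w: Word, q: int, p: int) -> int:
    return (w.val >> p) & ((1 << (q - p + 1)) - 1)


def inv_slice(w: Word, q: int, p: int) -> bool:
    return ((w.mask >> p) & ((1 << (q - p + 1)) - 1)) != 0


def inv_quot(a: Word, b: Word, m: int, q: int, p: int) -> bool:
    """Sound approximation of inv(quot[q:p]) for m-bit division a / b."""
    if inv_slice(b, m - 1, 0):                        # inv(b): use temp_1
        return inv_slice(a, m - 1, 0) or val_slice(a, m - 1, p) != 0
    i = b.val.bit_length() - 1                        # 2^i <= val(b) < 2^(i+1); val(b) != 0
    if b.val == 1 << i:                               # power of two: temp_3
        return p + i <= m - 1 and inv_slice(a, min(q + i, m - 1), p + i)
    return i < p or inv_slice(a, m - 1, p)


def inv_rem(a: Word, b: Word, m: int, q: int, p: int) -> bool:
    """Sound approximation of inv(rem[q:p]) for m-bit division a / b."""
    if inv_slice(b, m - 1, 0):
        return True
    i = b.val.bit_length() - 1
    if b.val == 1 << i:
        return i > p and inv_slice(a, min(q, i - 1), p)
    return i >= p
```

For example, dividing `Word(0xA0, 0x0F)` by the valid constant 16 leaves the low nibble of the quotient valid (it is the known upper nibble of the dividend), while the low nibble of the remainder is X.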


If-then-else statements. Consider a conditional assignment statement “if 
(BoolExpr) then x = Exp1; else x = Exp2;”. Symbolically simulating this 
statement gives x = ite(BoolExpr, Exp1, Exp2). The following gives a sound 
approximation of inv(x[q : p]). 

$inv(x[q : p]) = ite(inv(BoolExpr), temp_1, temp_2)$, where 
$temp_1 = inv(Exp1[q : p]) \lor inv(Exp2[q : p]) \lor (val(Exp1[q : p]) \neq val(Exp2[q : p]))$ 
$temp_2 = ite(val(BoolExpr), inv(Exp1[q : p]), inv(Exp2[q : p]))$ 

To see why the above approximation of inv(x[q : p]) is sound, let x = 
ite(BoolExpr, Exp1, Exp2), where BoolExpr is a boolean expression, and Exp1 and 
Exp2 are expressions of the same type as x. To compute inv(x[q : p]), we note that 
if inv(BoolExpr) = false, then inv(x[q : p]) is simply ite(val(BoolExpr), inv(Exp1[q : 
p]), inv(Exp2[q : p])). However, if inv(BoolExpr) = true, then the value of BoolExpr 
could be 1 (denoting true) or 0 (denoting false). Interestingly, if both inv(Exp1[q : 
p]) and inv(Exp2[q : p]) are false (i.e., neither Exp1[q : p] nor Exp2[q : p] is X), and 
if val(Exp1[q : p]) = val(Exp2[q : p]), then regardless of the value of BoolExpr, we 
have inv(x[q : p]) = false. This is formalized in the approximation for inv(x[q : p]) 
mentioned above. 
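
The if-then-else rule is short enough to state as a function. In this sketch, `Enc` is a hypothetical (val, inv) pair standing for the already-extracted slice [q : p] of each expression, and the condition is encoded the same way with val in {0, 1}.

```python
from typing import NamedTuple


class Enc(NamedTuple):
    val: int    # value component of the slice (or 0/1 for the condition)
    inv: bool   # invalid bit of the slice


def inv_ite(cond: Enc, exp1: Enc, exp2: Enc) -> bool:
    """Sound approximation of inv(x[q:p]) for x = ite(BoolExpr, Exp1, Exp2)."""
    if cond.inv:
        # temp_1: the condition is X, so both branches must be valid and agree
        return exp1.inv or exp2.inv or exp1.val != exp2.val
    # temp_2: the condition is known, so only the selected branch matters
    return exp1.inv if cond.val else exp2.inv
```

When the condition is X but both branches are valid and equal, as in `inv_ite(Enc(0, True), Enc(5, False), Enc(5, False))`, the result is still valid.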


Bit-wise logical operations. Let $\neg_m$ and $\land_m$ denote bit-wise negation and 
conjunction operators respectively, for m-bit words. If a, b, c and d are m-bit 
words such that $c = \neg_m a$ and $d = a \land_m b$, it is easy to see that the following give 
sound approximations of inv(c) and inv(d). 

$inv(c[q : p]) = inv(a[q : p])$ 

$inv(d[q : p]) = (inv(a[q : p]) \lor inv(b[q : p])) \land (inv(a[q : p]) \lor (val(a[q : p]) \neq 0)) \land$ 
$\qquad (inv(b[q : p]) \lor (val(b[q : p]) \neq 0))$ 

The invalid bits of other bit-wise logical operators (like disjunction, xor, nor, 
nand, etc.) can be obtained by first expressing them in terms of $\neg_m$ and $\land_m$ and 
then using the above approximations. 
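
Soundness of the conjunction rule can be checked exhaustively for small widths: whenever the rule reports a valid result, every concretization of the X operands must produce the same value. The following is a brute-force sketch for a single atom-sized slice; the helper names `inv_and`, `concretizations` and `and_rule_is_sound` are assumptions of this example.

```python
def inv_and(va: int, ia: bool, vb: int, ib: bool) -> bool:
    """Approximate invalid bit of d = a AND b for one slice, from (val, inv) pairs."""
    return (ia or ib) and (ia or va != 0) and (ib or vb != 0)


def concretizations(val: int, inv: bool, width: int):
    """All concrete values an operand may take: any value if X, else just val."""
    return range(1 << width) if inv else [val]


def and_rule_is_sound(width: int) -> bool:
    """Check that inv_and == False implies the conjunction has a unique value."""
    for va in range(1 << width):
        for vb in range(1 << width):
            for ia in (False, True):
                for ib in (False, True):
                    if inv_and(va, ia, vb, ib):
                        continue  # reporting X is always sound
                    results = {x & y
                               for x in concretizations(va, ia, width)
                               for y in concretizations(vb, ib, width)}
                    if len(results) != 1:
                        return False
    return True
```

Running `and_rule_is_sound(2)` explores all 2-bit operand encodings; the same style of check could be applied to the other operator rules.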



Memory/array reads and updates. Let A be a 1-dimensional array, i be an 
index expression, x be a variable, and Exp be an expression of the base type 
of A. On symbolically simulating the RTL statement “x = A[i]”, we update 
the value of x to read(A, i), where the read operator is as in the extensional 
theory of arrays. Similarly, on symbolically simulating the RTL 
statement “A[i] = Exp”, we update the value of array A to update($A_{orig}$, i, Exp), 
where $A_{orig}$ is the (array-typed) expression for A prior to simulating the 
statement, and the update operator is as in the extensional theory of arrays. 

Since the expression for a variable or array obtained by symbolic simulation
may now have read and update operators, we must find ways to compute sound
approximations of the invalid bit for expressions of the form inv(read(A, i)[q : p]).
Note that since A is an array, the symbolic expression for A is either (i) A_init, i.e.
the initial value of A at the start of symbolic simulation, or (ii) update(A', i', Exp')
for some expressions A', i' and Exp', where A' has the same array-type as A, i' has
an index type, and Exp' has the base type of A. For simplicity of exposition, we
assume that all arrays are either completely initialized or completely uninitialized
at the start of symbolic simulation. The invalid bit in case (i) is then easily seen
to be true if A_init denotes an uninitialized array, and false otherwise. In case (ii),
let v denote read(A, i). The invalid bit of v[q : p] can then be approximated as:

inv(v[q : p]) = inv(i) ∨ inv(i') ∨ ite(val(i) = val(i'), inv(Exp'[q : p]), temp), where
temp = inv(read(A', i)[q : p]).

To see why the above expression gives a sound approximation of inv(v[q : p]),
note that if either i or i' is X (i.e. the corresponding invalid bit is true), we
conservatively set inv(read(update(A', i', Exp'), i)[q : p]) to true. If neither i
nor i' is X, there are two cases to consider.

- If val(i) = val(i'), then read(update(A', i', Exp'), i) = Exp'. Hence, the required
  invalid bit is inv(Exp'[q : p]).

- If val(i) ≠ val(i'), then read(update(A', i', Exp'), i) = read(A', i). Hence, the
  required invalid bit is inv(read(A', i)[q : p]).
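
The case analysis above amounts to walking the chain of updates from the most recent one backwards. A minimal Python sketch on concrete values follows; the tuple-based array expressions and the name read_inv are assumptions for this example only, and case (i) is mirrored exactly as stated in the text.

```python
from typing import Tuple

Enc = Tuple[int, bool]  # hypothetical (val, inv) pair; inv=True means X

# Hypothetical array expressions used in this sketch:
#   ("init", initialized)       the initial array A_init
#   ("update", A', i', Exp')    the result of update(A', i', Exp')

def read_inv(arr: tuple, i: Enc) -> bool:
    """Approximate inv(read(arr, i)) by walking the chain of updates."""
    if arr[0] == "init":
        # Case (i): X exactly when the initial array is uninitialized.
        return not arr[1]
    _, prev, j, exp = arr
    if i[1] or j[1]:
        return True            # an X index on either side: be conservative
    if i[0] == j[0]:
        return exp[1]          # the read hits this update
    return read_inv(prev, i)   # the read falls through to the older array

a0 = ("init", False)                          # completely uninitialized
a1 = ("update", a0, (2, False), (7, False))   # A[2] = 7
a2 = ("update", a1, (3, False), (0, True))    # A[3] = X
print(read_inv(a2, (2, False)), read_inv(a2, (3, False)))
```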

If the RTL design has multi-dimensional arrays, we simply treat them as
arrays of arrays, and apply the same reasoning as above. For example, if B is
a two-dimensional array, the RTL statement “B[i][j] = Exp;” updates the
symbolic value of array B to update(B_orig, i, update(read(B_orig, i), j, Exp)), where
B_orig is the symbolic expression for B prior to simulating the RTL statement.
Similarly, the RTL statement “x = B[i][j];” updates the symbolic value of x
to read(read(B, i), j).


Shift operations. We discuss below the left-shift operation; the case of the
right-shift operation can be analyzed similarly. A shift operation can specify
either a constant number of bit positions to shift, or a variable number of positions
to shift. We analyze these two cases separately since shifting by a variable
number of positions does not allow us to statically identify the operand's bit-slices
of interest. In either case, we assume that a left-shift operation pads 0s in the
least significant shifted positions. Let ≪_k denote a unary left-shift operator of
the first kind, where k is a positive integer constant, and let ≪ denote a binary
left-shift operator of the second kind. Let a, b, c, d be m-bit words such that
b = ≪_k a and c = a ≪ d. For simplicity of presentation, we assume no wrap-around in
shifting; the case of wrap-around can be analyzed in a similar way. The following
equations give sound approximations of inv(b[q : p]) and inv(c[q : p]), where
0 ≤ p ≤ q ≤ m - 1.

inv(b[q : p]) = ite(p ≥ k, inv(a[q - k : p - k]), temp), where
temp = ite(q ≥ k, inv(a[q - k : 0]), false)                         (1)

inv(c[q : p]) = inv(a[q : 0]) ∧ (inv(d) ∨ (val(d) ≤ q))             (2)
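
Equation (1), the constant-shift case, can be illustrated on a concrete operand. The Python sketch below assumes, purely for illustration, that the X status of a is tracked per bit in a list (index 0 being the least significant bit) and that a slice is X when any of its bits is X; the names slice_inv and shl_const_inv are hypothetical. Only equation (1) is sketched here.

```python
from typing import List

def slice_inv(xbits: List[bool], hi: int, lo: int) -> bool:
    """Assumed slice rule for this sketch: a[hi:lo] is X if any bit is X."""
    return any(xbits[lo:hi + 1])

def shl_const_inv(xbits_a: List[bool], k: int, q: int, p: int) -> bool:
    """Equation (1): inv(b[q:p]) for b = a shifted left by a constant k."""
    if p >= k:
        # Every bit of the slice comes from a[q-k : p-k].
        return slice_inv(xbits_a, q - k, p - k)
    if q >= k:
        # The upper part comes from a[q-k : 0]; the rest is zero padding.
        return slice_inv(xbits_a, q - k, 0)
    return False  # the whole slice is zero padding, hence known

# An 8-bit operand whose least significant bit is X, shifted left by 2.
x = [True] + [False] * 7
print(shl_const_inv(x, 2, 3, 2), shl_const_inv(x, 2, 1, 0))
```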


4.3 Computing least upper bounds 

Let a = (val(a), inv(a)) and b = (val(b), inv(b)) be invalid-bit encoded elements
in the lattice of values for an m-bit atom. We define c = lub(a, b) as follows.

(a) If (¬inv(a) ∧ ¬inv(b) ∧ (val(a) ≠ val(b))), then c = ⊤.

(b) Otherwise, inv(c) = inv(a) ∧ inv(b) and val(c) = ite(inv(a), val(b), val(a)) (or
    equivalently val(c) = ite(inv(b), val(a), val(b))).

Note the freedom in defining val(c) in case (b) above. This freedom comes from
the observation that if inv(c) = true, the value of val(c) is irrelevant. Furthermore,
if the condition in case (a) is not satisfied and if both inv(a) and inv(b) are false,
then val(a) = val(b), and hence val(b) = val(c). This allows us to simplify the
expression for val(c) on-the-fly by replacing it with val(b), if needed.
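
The two cases of the lub definition translate directly into code. The following Python sketch works on concrete (val, inv) pairs; the TOP marker and the function name lub are assumptions of this example rather than part of the tool.

```python
from typing import Tuple, Union

Enc = Tuple[int, bool]  # hypothetical (val, inv) pair; inv=True means X
TOP = "TOP"             # hypothetical marker for the over-constrained value

def lub(a: Enc, b: Enc) -> Union[Enc, str]:
    """Least upper bound of two invalid-bit encoded values of one atom."""
    (va, ia), (vb, ib) = a, b
    # Case (a): two known but different values cannot be reconciled.
    if not ia and not ib and va != vb:
        return TOP
    # Case (b): the result is X only if both are X; otherwise take the
    # value of whichever operand is known.
    return (vb if ia else va, ia and ib)

print(lub((3, False), (0, True)))   # known value joined with X
print(lub((3, False), (4, False)))  # conflicting known values
```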

5 Word-level STE 

In this section, we briefly review the general theory of STE instantiated
to the lattice of values of atoms. An RTL design C consists of inputs, outputs
and internal words. We treat bit-level signals as 1-bit words, and uniformly talk
of words. Every input, output and internal word is assumed to be atomized
as described earlier. Every atom of bit-width m takes values from the
set {0, ..., 2^m - 1, X}, where constant bit-vectors have been represented by their
integer values. The values themselves are ordered in a lattice, as discussed
earlier. Let ⊑_m denote the ordering relation and ⊔_m denote the lub operator
in the lattice of values for an m-bit atom. The lattice of values for a word is the
product of lattices corresponding to every atom in the word. Let A denote the
collection of all atoms in the design, and let V denote the collection of values
of all atoms in A. A state of the design is a mapping s : A → V ∪ {⊤} such that
if a ∈ A is an m-bit atom, then s(a) is a value in the set {0, ..., 2^m - 1, X, ⊤}.
Let S denote the set of all states of the design. Clearly S forms a lattice, one
that is isomorphic to the product of lattices corresponding to the atoms in A.


Given a design C, let Tr_C : S → S define the transition function of C.
Thus, given a state s of C at time t, the next state of the design at time t + 1
is given by Tr_C(s). To model the behavior of a design over time, we define a
sequence of states as a mapping σ : N → S, where N denotes the set of natural
numbers. A trajectory for a design C is a sequence σ such that for all t ∈ N,
Tr_C(σ(t)) ⊑ σ(t + 1). Given two sequences σ1 and σ2, we abuse notation and say
that σ1 ⊑ σ2 iff for every t ∈ N, σ1(t) ⊑ σ2(t).

The general trajectory evaluation logic of Seger and Bryant can be
instantiated to words as follows. A trajectory formula is a formula generated by
the grammar ψ ::= a is val | ψ and ψ | P → ψ | N ψ, where a is an atom of C,
val is a non-X, non-⊤ value in the lattice of values for a, and P is a
quantifier-free formula in the theory of bit-vectors. Formulas like P in the
grammar above are also called guards in STE parlance.

Following Seger et al., the defining sequence of a trajectory formula ψ
given the assignment φ, denoted [ψ]^φ, is defined inductively as follows. Here, b
denotes an arbitrary m-bit atom in A and t ∈ N.

- [a is val]^φ(t)(b) = val if t = 0 and both a, b denote the same m-bit atom,
  and is X otherwise.

- [ψ1 and ψ2]^φ(t)(b) = [ψ1]^φ(t)(b) ⊔_m [ψ2]^φ(t)(b)

- [P → ψ]^φ(t)(b) = [ψ]^φ(t)(b) if φ ⊨ P, and is X otherwise.

- [N ψ]^φ(t)(b) = [ψ]^φ(t - 1)(b) if t ≠ 0, and is X otherwise.
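
The inductive definition of the defining sequence reads naturally as a recursive evaluator. The Python sketch below is only an illustration: formulas are encoded as nested tuples, guards as Python predicates over an assignment dictionary, and the markers X and TOP as strings, all of which are assumptions of this example.

```python
X, TOP = "X", "TOP"  # hypothetical markers for the unknown and top values

def lub(u, v):
    """Join in the flat lattice X < concrete values < TOP."""
    if u == X:
        return v
    if v == X:
        return u
    return u if u == v else TOP

def defining_seq(psi, phi, t, b):
    """Value of atom b at time t in the defining sequence of psi under phi."""
    kind = psi[0]
    if kind == "is":                       # a is val
        _, a, val = psi
        return val if (t == 0 and a == b) else X
    if kind == "and":                      # psi1 and psi2
        return lub(defining_seq(psi[1], phi, t, b),
                   defining_seq(psi[2], phi, t, b))
    if kind == "guard":                    # P -> psi
        _, guard, body = psi
        return defining_seq(body, phi, t, b) if guard(phi) else X
    if kind == "next":                     # N psi
        return defining_seq(psi[1], phi, t - 1, b) if t != 0 else X
    raise ValueError(f"unknown formula kind: {kind}")

# (a is 3) and N (en -> (b is 5))
psi = ("and", ("is", "a", 3),
       ("next", ("guard", lambda phi: phi["en"], ("is", "b", 5))))
print(defining_seq(psi, {"en": True}, 1, "b"))
```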

Similarly, the defining trajectory of ψ with respect to a design C, denoted [[ψ]]^φ_C,
can be defined as follows.

- [[ψ]]^φ_C(0) = [ψ]^φ(0)

- [[ψ]]^φ_C(t + 1) = [ψ]^φ(t + 1) ⊔ Tr_C([[ψ]]^φ_C(t)) for every t ∈ N.

In symbolic trajectory evaluation, we are given an antecedent Ant and a
consequent Cons in trajectory evaluation logic. We are also given a quantifier-free
formula Constr in the theory of bit-vectors with free variables that appear in the
guards of Ant and/or Cons. We wish to determine if for every assignment φ that
satisfies Constr, we have [Cons]^φ ⊑ [[Ant]]^φ_C.

5.1 Implementation 

We have developed a tool called STEWord that uses symbolic simulation with
invalid-bit encoding and SMT solving to perform STE. Each antecedent and
consequent tuple has the format (g, a, vexpr, start, end), where g is a guard, a
is the name of an atom in the design under verification, vexpr is a symbolic
expression over constants and guard variables that specifies the value of a, and
start and end denote time points such that end ≥ start + 1.

An antecedent tuple (g, a, vexpr, t1, t2) specifies that given an assignment φ
of guard variables, if φ ⊨ g, then atom a is assigned the value of expression
vexpr, evaluated on satisfying assignments of φ, for all time in {t1, ..., t2 - 1}.
If, however, φ ⊭ g, atom a is assigned the value X for all time in {t1, ..., t2 - 1}.
If a is an input atom, the antecedent tuple effectively specifies how it is driven
from time t1 through t2 - 1. Using invalid-bit encoding, the above semantics
is easily implemented by setting inv(a) to ¬g and val(a) to vexpr from time t1
through t2 - 1. If a is an internal atom, the defining trajectory requires us to
compute the lub of the value driven by the circuit on a and the value specified by
the antecedent for a, at every time point in {t1, ..., t2 - 1}. The value driven by
the circuit on a at any time is computed by symbolic simulation using
invalid-bit encoding, as explained in Sections 4.1 and 4.2. The value driven by the
antecedent can also be invalid-bit encoded, as described above. Therefore, the
lub can be computed as described in Section 4.3. If the lub is not ⊤, val(a) and
inv(a) can be set to the value and invalid-bit, respectively, of the lub. In practice,
we assume that the lub is not ⊤ and proceed as above. The conditions under
which the lub evaluates to ⊤ are collected separately, as described below. The
values of all atoms that are not specified in any antecedent tuple are obtained
by symbolically simulating the circuit using invalid-bit encoding.

If the lub computed above evaluates to ⊤, we must set atom a to an unachievable
over-constrained value. This is called antecedent failure in STE parlance.
In our implementation, we collect the constraints (condition for case (a) in
Section 4.3) under which antecedent failure occurs for every antecedent tuple in a
set AntFail. Depending on the mode of verification, we do one of the following:


- If the disjunction of formulas in AntFail is satisfiable, we conclude that there
  is an assignment of guard variables that leads to an antecedent failure. This
  can then be viewed as a failed run of verification.

- We may also wish to check if [Cons]^φ ⊑ [[Ant]]^φ_C only for assignments φ that
  do not satisfy any formula in AntFail. In this case, we conjoin the negation
  of every formula in AntFail to obtain a formula, say NoAntFail, that defines
  all assignments φ of interest.

A consequent tuple (g, a, vexpr, t1, t2) specifies that given an assignment φ
of guard variables, if φ ⊨ g, then atom a must have its invalid bit set to false
and value set to vexpr, evaluated on satisfying assignments of φ, for all time in
{t1, ..., t2 - 1}. If φ ⊭ g, a consequent tuple imposes no requirement on the value
of atom a. Suppose that at time t, a consequent tuple specifies a guard g and a
value expression vexpr for an atom a. Suppose further that (val(a), inv(a)) gives
the invalid-bit encoded value of this atom at time t, as obtained from symbolic
simulation. Checking whether [Cons]^φ(t)(a) ⊑ [[Ant]]^φ_C(t)(a) for all assignments
φ reduces to checking the validity of the formula (g → (¬inv(a) ∧ (vexpr =
val(a)))). Let us call this formula OK_{a,t}. Let T denote the set of all time points
specified in all consequent tuples, and let A denote the set of all atoms of the
design. The overall verification goal then reduces to checking the validity of the
formula OK = ⋀_{t ∈ T, a ∈ A} OK_{a,t}. If we wish to focus only on assignments φ that
do not cause any antecedent failure, our verification goal is modified to check
the validity of NoAntFail → OK. In our implementation, we use Boolector [T],
a state-of-the-art solver for bit-vectors and the extensional theory of arrays, to
check the validity (or satisfiability) of all formulas generated by STEWord.
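
To make the shape of the check concrete, the sketch below builds the formula for a single atom and time point and tests its validity. Note the substitution: instead of an SMT solver it simply enumerates all assignments of Boolean guard variables, which is only workable for tiny examples. All names (is_valid, ok_formula, the guard variable en) are hypothetical.

```python
from itertools import product

def is_valid(guard_vars, formula):
    """True if formula(phi) holds for every Boolean assignment phi."""
    for bits in product([False, True], repeat=len(guard_vars)):
        if not formula(dict(zip(guard_vars, bits))):
            return False
    return True

def ok_formula(g, vexpr, val_a, inv_a):
    """Build OK_{a,t}: g -> (not inv(a) and vexpr = val(a))."""
    return lambda phi: (not g(phi)) or (
        not inv_a(phi) and vexpr(phi) == val_a(phi))

# Simulated atom: known and equal to 5 exactly when the guard en holds.
ok = ok_formula(g=lambda phi: phi["en"],
                vexpr=lambda phi: 5,
                val_a=lambda phi: 5 if phi["en"] else 0,
                inv_a=lambda phi: not phi["en"])

# A buggy variant: the atom is always known, but carries the wrong value.
bad = ok_formula(g=lambda phi: phi["en"],
                 vexpr=lambda phi: 5,
                 val_a=lambda phi: 4,
                 inv_a=lambda phi: False)

print(is_valid(["en"], ok), is_valid(["en"], bad))
```

Restricting attention to assignments without antecedent failure corresponds to checking the implication from NoAntFail to the formula, which in this sketch is just another predicate passed to is_valid.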




6 Experiments 


We used STEWord to verify properties of a set of System-Verilog word-level
benchmark designs. Bit-level STE tools are often known to require user-guidance
with respect to problem decomposition and variable ordering (for BDD based
tools), when verifying properties of designs with moderate to wide datapaths.
Similarly, BMC tools need to introduce a fresh variable for each input in each
time frame when the value of the input is unspecified. Our benchmarks were
intended to stress bit-level STE tools, and included designs with control and
datapath logic, where the width of the datapath was parameterized. Our benchmarks
were also intended to stress BMC tools by providing relatively long sequences
of inputs that could either be X or a specified symbolic value, depending on a
symbolic condition. In each case, we verified properties that were satisfied by the
system and those that were not. For comparative evaluation, we implemented
word-level bounded model checking as an additional feature of STEWord itself.
Below, we first give a brief description of each design, followed by a discussion
of our experiments.

Design 1: Our first design was a three-stage pipelined circuit that read
four pairs of k-bit words in each cycle, computed the absolute difference of
each pair, and then added the absolute differences with a current running sum.
Alternatively, if a reset signal was asserted, the pipeline stage that stored the sum
was reset to the all-zero value, and the addition of absolute differences of pairs of
inputs started afresh from the next cycle. In order to reduce the stage delays in
the pipeline, the running sum was stored in a redundant format and carry-save
adders were used to perform all additions/subtractions. Only in the final stage
was the non-redundant result computed. In addition, the design made extensive
use of clock gating to reduce its dynamic power consumption, a characteristic
of most modern designs that significantly complicates formal verification.
Because of the non-trivial control and clock gating, the STE verification required
a simple datapath invariant. Furthermore, in order to reduce the complexity in
specifying the correctness, we broke down the overall verification goal into six
properties, and verified these properties using several datapath widths.

Design 2: Our second design was a pipelined serial multiplier that read two
k-bit inputs serially from a single k-bit input port, multiplied them and made the
result available on a 2k-bit wide output port in the cycle after the second input
was read. The entire multiplication cycle was then re-started afresh. By asserting
and de-asserting special input flags, the control logic allowed the circuit to wait
indefinitely between reading its first and second inputs, and also between reading
its second input and making the result available. We verified several properties
of this circuit, including checking whether the result computed was indeed the
product of two values read from the inputs, whether the inputs and results were
correctly stored in intermediate pipeline stages for various sequences of asserting
and de-asserting of the input flags, etc. In each case, we tried the verification
runs using different values of the bit-width k.

Design 3: Our third design was an implementation of the first stage in a
typical digital camera pipeline. The design is fed the output of a single CCD/CMOS
sensor array whose pixels have different color filters in front of them in a Bayer
mosaic pattern [5]. The design takes these values and performs a “de-mosaicing”
of the image, which basically uses a fairly sophisticated interpolation technique
(including edge detection) to estimate the missing color values. The challenge
here was not only verifying the computation, which entailed adding a fairly large
number of scaled inputs, but also verifying that the correct pixel values were
used. In fact, most non-STE based formal verification engines will encounter
difficulty with this design since the final result depends on several hundreds of
8-bit quantities.

Design 4: Our fourth design is a more general version of Design 3, that
takes as input a stream of values from a single sensor with a mosaic filter having
alternating colors, and produces an interpolated red, green and blue stream as
output. Here, we verify 36 different locations on the screen, which translates to
36 different locations in the input stream. Analyzing this example with BMC
requires providing new inputs every cycle for over 200 cycles, leading to a
blow-up in the number of variables used.

For each benchmark design, we experimented with a bug-free version, and
with several buggy versions. For bit-level verification, we used both a BDD-based
STE tool [11] and a propositional SAT based STE tool [9]; specifically, the tool
Forte was used for bit-level STE. We also ran word-level BMC to verify the same
properties.

In all our benchmarks, we found that Forte and STEWord successfully verified
the properties within a few seconds when the bitwidth was small (8 bits).
However, the running time of Forte increased significantly with increasing bit-width,
and for bit-widths of 16 and above, Forte could not verify the properties without
serious user intervention. In contrast, STEWord required practically the same
time to verify properties of circuits with wide datapaths, as was needed to verify
properties of the same circuits with narrower datapaths, and required no user
intervention. In fact, the word-level SMT constraints generated for a circuit with
a narrow datapath are almost identical to those generated for a circuit with a
wider datapath, except for the bit-widths of atoms. This is not surprising, since
once atomization is done, symbolic simulation is agnostic to the widths of
various atoms. An advanced SMT solver like Boolector is often able to exploit the
word-level structure of the final set of constraints and solve it without resorting
to bit-blasting.

The BMC experiments involved adding a fresh variable in each time frame
when the value of an input was not specified or conditionally specified. This
resulted in a significant blow-up in the number of additional variables,
especially when we have long sequences of conditionally driven inputs. This in turn
adversely affected SMT-solving time, causing BMC to time out in some cases.

To illustrate how the verification effort with STEWord compared with the
effort required to verify the same property with a bit-level BDD- or SAT-based
STE tool, and with word-level BMC, we present a sampling of our observations
in Table 2, where no user intervention was allowed for any tool. Here, entries
without a reported time indicate more than 2 hours of running time, and all times
are on an Intel Xeon 3GHz CPU, using a single core. In the column labeled
“Benchmark”, Designi-Pj corresponds to verifying property j (from a list of
properties) on Design i. The column labeled “Word-level latches (# bits)” gives
the number of word-level latches and the total number of bits in those latches
for a given benchmark. The column labeled “Cycles of Simulation” gives the total
number of time-frames for which STE and BMC were run. The column labeled
“Atom Size (largest)” gives the largest size of an atom after our atomization step.
Clearly, atomization did not bit-blast all words, allowing us to reason at the
granularity of multi-bit atoms in STEWord.


Benchmark            | STEWord | Forte         | BMC   | Word-level latches          | Cycles of  | Atom Size
                     |         | (BDD and SAT) |       | (# bits)                    | Simulation | (largest)
---------------------+---------+---------------+-------+-----------------------------+------------+----------
Design1-P1 (32 bits) | 2.38s   |               | 3.71s | 14 latches (235 bits wide)  | 12         | 31
Design1-P1 (64 bits) | 2.77s   |               | 4.53s | 14 latches (463 bits wide)  | 12         | 64
Design2-P2 (16 bits) | 1.56s   |               | 1.50s | 4 latches (96 bits wide)    | 6          | 32
Design2-P2 (32 bits) | 1.65s   |               | 1.52s | 4 latches (128 bits wide)   | 6          | 64
Design3-P3 (16 bits) | 24.06s  |               | -     | 54 latches (787 bits wide)  | 124        | 16
Design4-P4 (16 bits) | 56.80s  |               | -     | 54 latches (787 bits wide)  | 260        | 16
Design4-P4 (32 bits) | 55.65s  |               | -     | 54 latches (1555 bits wide) | 260        | 32

Table 2. Comparing verification effort (time) with STEWord, Forte and BMC


Our experiments indicate that when a property is not satisfied by a circuit,
Boolector finds a counterexample quickly due to powerful search heuristics
implemented in modern SMT solvers. BDD-based bit-level STE engines are, however,
likely to suffer from BDD size explosion in such cases, especially when the
bit-widths are large. In cases where there are long sequences of conditionally driven
inputs (e.g., Design 4), BMC performs worse compared to STEWord, presumably
because of the added complexity of solving constraints with a significantly larger
number of variables. In other cases, the performance of BMC is comparable
to that of STEWord. An important observation is that the abstractions
introduced by atomization and by approximations of invalid-bit expressions do not
cause STEWord to produce conservative results in any of our experiments. Thus,
STEWord strikes a good balance between accuracy and performance. Another
interesting observation is that for correct designs and properties, SMT solvers
(all we tried) sometimes fail to verify the correctness (by proving unsatisfiability
of a formula). This points to the need for further developments in SMT solving,
particularly for proving unsatisfiability of complex formulas. Overall, our
experiments, though limited, show that word-level STE can be beneficial compared
to both bit-level STE and word-level BMC in real-life verification problems.

We are currently unable to make the binaries or source of STEWord publicly 
available due to a part of the code being proprietary. A web-based interface to 
STEWord, along with a usage document and the benchmarks reported in this 
paper, is available at http://www.cfdvs.iitb.ac.in/WSTE/ 

















7 Conclusion 


Increasing the level of abstraction from bits to words is a promising approach to
scaling STE to large designs with wide datapaths. In this paper, we proposed a
methodology and presented a tool to achieve this automatically. Our approach
lends itself to a counterexample guided abstraction refinement (CEGAR)
framework, where refinement corresponds to reducing the conservativeness in
invalid-bit expressions, and to splitting existing atoms into finer bit-slices. We
intend to build a CEGAR-style word-level STE tool as part of future work.
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